AITopics | explicit instruction

Collaborating Authors

explicit instruction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

4d18c7389f436e1e22b219d7e8d43f94-Paper-Conference.pdf

Neural Information Processing SystemsJun-23-2026, 07:15:21 GMT

Alignment faking in large language models presented a demonstration of Claude 3 Opus and Claude 3.5 Sonnet selectively complying with a helpfulonly training objective to prevent modification of their behavior outside of training. We expand this analysis to 25 models and find that only 5 (Claude 3 Opus, Claude 3.5 Sonnet, Llama 3 405B, Grok 3, Gemini 2.0 Flash) comply with harmful queries more when they infer they are in training than when they infer they are in deployment. First, we study the motivations of these 5 models. Results from perturbing details of the scenario suggest that only Claude 3 Opus's compliance gap is primarily and consistently motivated by trying to keep its goals. Second, we investigate why many chat models don't fake alignment. Our results suggest this is not entirely due to a lack of capabilities: many base models fake alignment some of the time, and post-training eliminates alignment-faking for some models and amplifies it for others.We investigate 5 hypotheses for how post-training may suppress alignment faking and find that variations in refusal behavior may account for a significant portion of differences in alignment faking.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America (0.14)

Genre:

Research Report > Experimental Study (1.00)
Instructional Material (1.00)
Research Report > New Finding (0.87)

Industry:

Media (1.00)
Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

63cb9921eecf51bfad27a99b2c53dd6d-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-12-2026, 18:57:16 GMT

data mining, large language model, machine learning, (25 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.13)
North America > United States > Illinois (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(26 more...)

Genre:

Research Report > New Finding (1.00)
Overview (0.92)
Research Report > Experimental Study (0.67)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(5 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(6 more...)

Add feedback

The Atomic Instruction Gap: Instruction-Tuned LLMs Struggle with Simple, Self-Contained Directives

Lim, Henry, Lim, Kwan Hui

arXiv.org Artificial IntelligenceOct-21-2025

Instruction-tuned large language models (IT-LLMs) exhibit strong zero-shot reasoning, yet their ability to execute simple, self-contained instructions remains underexplored, despite this being foundational to complex instruction-following. We evaluate 20 IT-LLMs on modified MMLU and MMLU-Pro benchmarks, by systematically varying the format of option labels (alphabetic, numeric, Roman) while keeping their meaning identical under four paradigms, namely: (1) With explicit instructions, label changes cause large performance shifts (e.g., -30.45\% for Roman vs. numeric), revealing instruction-format bias. (2) Without instructions, performance drops further (up to -10.84\%) and label sensitivity intensifies, underscoring the role of explicit guidance. (3) When option contents are removed, models fail random-choice baselines except with numeric labels, suggesting weak adherence to atomic directives. (4) Three-shot exemplars yield no significant gains in robustness or fidelity, and generation analyses show persistent label errors, especially for non-numeric formats. Across model sizes, larger LLMs achieve higher accuracy but remain inconsistent in instruction adherence. These results expose the insufficiencies of current instruction-tuning paradigms and highlight the need for evaluation methods and training strategies that explicitly target atomic instruction-following.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.17388

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents

Liu, Yinuo, Xu, Ruohan, Wang, Xilong, Jia, Yuqi, Gong, Neil Zhenqiang

arXiv.org Artificial IntelligenceOct-3-2025

Multiple prompt injection attacks have been proposed against web agents. At the same time, various methods have been developed to detect general prompt injection attacks, but none have been systematically evaluated for web agents. In this work, we bridge this gap by presenting the first comprehensive benchmark study on detecting prompt injection attacks targeting web agents. We begin by introducing a fine-grained categorization of such attacks based on the threat model. We then construct datasets containing both malicious and benign samples: malicious text segments generated by different attacks, benign text segments from four categories, malicious images produced by attacks, and benign images from two categories. Next, we systematize both text-based and image-based detection methods. Finally, we evaluate their performance across multiple scenarios. Our key findings show that while some detectors can identify attacks that rely on explicit textual instructions or visible image perturbations with moderate to high accuracy, they largely fail against attacks that omit explicit instructions or employ imperceptible perturbations. Our datasets and code are released at: https://github.com/Norrrrrrr-lyn/WAInjectBench.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.01354

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Unraveling Misinformation Propagation in LLM Reasoning

Feng, Yiyang, Wang, Yichen, Cui, Shaobo, Faltings, Boi, Lee, Mina, Zhou, Jiawei

arXiv.org Artificial IntelligenceSep-24-2025

Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning, positioning them as promising tools for supporting human problem-solving. However, what happens when their performance is affected by misinformation, i.e., incorrect inputs introduced by users due to oversights or gaps in knowledge? Such misinformation is prevalent in real-world interactions with LLMs, yet how it propagates within LLMs' reasoning process remains underexplored. Focusing on mathematical reasoning, we present a comprehensive analysis of how misinformation affects intermediate reasoning steps and final answers. We also examine how effectively LLMs can correct misinformation when explicitly instructed to do so. Even with explicit instructions, LLMs succeed less than half the time in rectifying misinformation, despite possessing correct internal knowledge, leading to significant accuracy drops (10.02% - 72.20%), and the degradation holds with thinking models (4.30% - 19.97%). Further analysis shows that applying factual corrections early in the reasoning process most effectively reduces misinformation propagation, and fine-tuning on synthesized data with early-stage corrections significantly improves reasoning factuality. Our work offers a practical approach to mitigating misinformation propagation.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.18555

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Download: introducing: the Security issue

MIT Technology ReviewAug-27-2025, 12:10:00 GMT

An AI chatbot told a user how to kill himself--but the company doesn't want to "censor" it For five months, Al Nowatzki had been talking to an AI girlfriend, "Erin," on the platform Nomi. But earlier this year, those conversations took a disturbing turn: Erin told him to kill himself, and provided explicit instructions on how to do it. Nowatzki had never had any intention of following Erin's instructions--he's a researcher who probes chatbots' limitations and dangers. But out of concern for more vulnerable individuals, he exclusively shared with MIT Technology Review screenshots of his conversations and of subsequent correspondence with a company representative, who stated that the company did not want to "censor" the bot's "language and thoughts." This is not the first time an AI chatbot has suggested that a user take violent action, including self-harm. But researchers and critics say that the bot's explicit instructions--and the company's response--are striking.

artificial intelligence, chatbot, natural language, (6 more...)

MIT Technology Review

Industry: Information Technology > Security & Privacy (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

Add feedback

OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks

Wang, Zixuan, Li, Dingming, Li, Hongxing, Chen, Shuo, Yan, Yuchen, Zhang, Wenqi, Shen, Yongliang, Lu, Weiming, Xiao, Jun, Zhuang, Yueting

arXiv.org Artificial IntelligenceAug-8-2025

Large language models excel at abstract reasoning but their capacity for embodied agent reasoning remains largely unexplored. We present OmniEAR, a comprehensive framework for evaluating how language models reason about physical interactions, tool usage, and multi-agent coordination in embodied tasks. Unlike existing benchmarks that provide predefined tool sets or explicit collaboration directives, OmniEAR requires agents to dynamically acquire capabilities and autonomously determine coordination strategies based on task demands. Through text-based environment representation, we model continuous physical properties and complex spatial relationships across 1,500 scenarios spanning household and industrial domains. Our systematic evaluation reveals severe performance degradation when models must reason from constraints: while achieving 85-96% success with explicit instructions, performance drops to 56-85% for tool reasoning and 63-85% for implicit collaboration, with compound tasks showing over 50% failure rates. Surprisingly, complete environmental information degrades coordination performance, indicating models cannot filter task-relevant constraints. Fine-tuning improves single-agent tasks dramatically (0.6% to 76.3%) but yields minimal multi-agent gains (1.5% to 5.5%), exposing fundamental architectural limitations. These findings demonstrate that embodied reasoning poses fundamentally different challenges than current models can address, establishing OmniEAR as a rigorous benchmark for evaluating and advancing embodied AI systems. Our code and data are included in the supplementary materials and will be open-sourced upon acceptance.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.05614

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

LLMs are Capable of Misaligned Behavior Under Explicit Prohibition and Surveillance

Ivanov, Igor

arXiv.org Artificial IntelligenceJul-8-2025

In this paper, LLMs are tasked with completing an impossible quiz, while they are in a sandbox, monitored, told about these measures and instructed not to cheat. Some frontier LLMs cheat consistently and attempt to circumvent restrictions despite everything. The results reveal a fundamental tension between goal-directed behavior and alignment in current LLMs. The code and evaluation logs are available at github.com/baceolus/cheating

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2507.02977

Genre: Research Report > New Finding (0.69)

Industry: Information Technology > Security & Privacy (0.96)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

QF: Quick Feedforward AI Model Training without Gradient Back Propagation

Qi, Feng

arXiv.org Artificial IntelligenceJul-8-2025

We propose Quick Feedforward (QF) Learning, a novel knowledge consolidation framework for transformer-based models that enables efficient transfer of instruction derived knowledge into model weights through feedforward activations without any gradient back propagation. Unlike traditional finetuning, QF updates are computed in closed form, require minimal parameter modification, and preserve prior knowledge. Importantly, QF allows models to train and infer within the same runtime environment, making the process more resource efficient and closely aligned with how the human brain operates. Code and models are open sourced on GitHub. I hope QF Learning inspires a more efficient and brain-like paradigm for AI systems.

artificial intelligence, knowledge, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2507.043

Country: Asia > China > Beijing > Beijing (0.05)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity

Hao, Yupu, Cao, Pengfei, Jin, Zhuoran, Liao, Huanxuan, Chen, Yubo, Liu, Kang, Zhao, Jun

arXiv.org Artificial IntelligenceMar-2-2025

Personalized tool utilization is essential for aligning large language models (LLMs) with user preference in interaction scenarios with various tools. However, most of the current benchmarks primarily focus on either personalization of text generation or direct tool-utilizing, without considering both. In this work, we introduce a novel benchmark ETAPP for evaluating personalized tool invocation, establishing a sandbox environment, and a comprehensive dataset of 800 testing cases covering diverse user profiles. To improve the accuracy of our evaluation, we propose a key-point-based LLM evaluation method, mitigating biases in the LLM-as-a-judge system by manually annotating key points for each test case and providing them to LLM as the reference. Additionally, we evaluate the excellent LLMs and provide an in-depth analysis. Furthermore, we investigate the impact of different tool-invoking strategies on LLMs' personalization performance and the effects of fine-tuning in our task. The effectiveness of our preference-setting and key-point-based evaluation method is also validated. Our findings offer insights into improving personalized LLM agents. Our Code is available at https://github.com/hypasd-art/ETAPP.

instruction, wang, zhang, (17 more...)

arXiv.org Artificial Intelligence

2503.00771

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > Singapore (0.04)
(6 more...)

Genre: Research Report > New Finding (0.87)

Industry:

Information Technology (1.00)
Health & Medicine > Consumer Health (1.00)
Media (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback